主动扬声器检测(ASD)问题的最新进展基于两个阶段的过程:特征提取和时空上下文集合。在本文中,我们提出了一个端到端的ASD工作流程,在其中共同学习特征学习和上下文预测。我们的端到端可训练网络同时学习了多模式的嵌入和汇总时空上下文。这会导致更合适的功能表示,并改善了ASD任务的性能。我们还介绍了交织的图神经网络(IGNN)块,该块根据ASD问题中的上下文主要来源分割消息。实验表明,IGNN块的汇总特征更适合ASD,从而导致最先进的性能。最后,我们设计了一种弱监督的策略,该策略表明也可以通过使用视听数据来解决ASD问题,但仅依赖于音频注释。我们通过对音频信号与可能的声源(扬声器)之间的直接关系进行建模以及引入对比度损失来实现这一目标。该项目的所有资源将在以下网址提供:https://github.com/fuankarion/end-to-end-end-asd。
translated by 谷歌翻译
There is a dramatic shortage of skilled labor for modern vineyards. The Vinum project is developing a mobile robotic solution to autonomously navigate through vineyards for winter grapevine pruning. This necessitates an autonomous navigation stack for the robot pruning a vineyard. The Vinum project is using the quadruped robot HyQReal. This paper introduces an architecture for a quadruped robot to autonomously move through a vineyard by identifying and approaching grapevines for pruning. The higher level control is a state machine switching between searching for destination positions, autonomously navigating towards those locations, and stopping for the robot to complete a task. The destination points are determined by identifying grapevine trunks using instance segmentation from a Mask Region-Based Convolutional Neural Network (Mask-RCNN). These detections are sent through a filter to avoid redundancy and remove noisy detections. The combination of these features is the basis for the proposed architecture.
translated by 谷歌翻译
Ithaca is a Fuzzy Logic (FL) plugin for developing artificial intelligence systems within the Unity game engine. Its goal is to provide an intuitive and natural way to build advanced artificial intelligence systems, making the implementation of such a system faster and more affordable. The software is made up by a C\# framework and an Application Programming Interface (API) for writing inference systems, as well as a set of tools for graphic development and debugging. Additionally, a Fuzzy Control Language (FCL) parser is provided in order to import systems previously defined using this standard.
translated by 谷歌翻译
Quantum Machine Learning (QML) shows how it maintains certain significant advantages over machine learning methods. It now shows that hybrid quantum methods have great scope for deployment and optimisation, and hold promise for future industries. As a weakness, quantum computing does not have enough qubits to justify its potential. This topic of study gives us encouraging results in the improvement of quantum coding, being the data preprocessing an important point in this research we employ two dimensionality reduction techniques LDA and PCA applying them in a hybrid way Quantum Support Vector Classifier (QSVC) and Variational Quantum Classifier (VQC) in the classification of Diabetes.
translated by 谷歌翻译
Uncertainty quantification is crucial to inverse problems, as it could provide decision-makers with valuable information about the inversion results. For example, seismic inversion is a notoriously ill-posed inverse problem due to the band-limited and noisy nature of seismic data. It is therefore of paramount importance to quantify the uncertainties associated to the inversion process to ease the subsequent interpretation and decision making processes. Within this framework of reference, sampling from a target posterior provides a fundamental approach to quantifying the uncertainty in seismic inversion. However, selecting appropriate prior information in a probabilistic inversion is crucial, yet non-trivial, as it influences the ability of a sampling-based inference in providing geological realism in the posterior samples. To overcome such limitations, we present a regularized variational inference framework that performs posterior inference by implicitly regularizing the Kullback-Leibler divergence loss with a CNN-based denoiser by means of the Plug-and-Play methods. We call this new algorithm Plug-and-Play Stein Variational Gradient Descent (PnP-SVGD) and demonstrate its ability in producing high-resolution, trustworthy samples representative of the subsurface structures, which we argue could be used for post-inference tasks such as reservoir modelling and history matching. To validate the proposed method, numerical tests are performed on both synthetic and field post-stack seismic data.
translated by 谷歌翻译
Understanding 3D environments semantically is pivotal in autonomous driving applications where multiple computer vision tasks are involved. Multi-task models provide different types of outputs for a given scene, yielding a more holistic representation while keeping the computational cost low. We propose a multi-task model for panoptic segmentation and depth completion using RGB images and sparse depth maps. Our model successfully predicts fully dense depth maps and performs semantic segmentation, instance segmentation, and panoptic segmentation for every input frame. Extensive experiments were done on the Virtual KITTI 2 dataset and we demonstrate that our model solves multiple tasks, without a significant increase in computational cost, while keeping high accuracy performance. Code is available at https://github.com/juanb09111/PanDepth.git
translated by 谷歌翻译
Detecting anomalous data within time series is a very relevant task in pattern recognition and machine learning, with many possible applications that range from disease prevention in medicine, e.g., detecting early alterations of the health status before it can clearly be defined as "illness" up to monitoring industrial plants. Regarding this latter application, detecting anomalies in an industrial plant's status firstly prevents serious damages that would require a long interruption of the production process. Secondly, it permits optimal scheduling of maintenance interventions by limiting them to urgent situations. At the same time, they typically follow a fixed prudential schedule according to which components are substituted well before the end of their expected lifetime. This paper describes a case study regarding the monitoring of the status of Laser-guided Vehicles (LGVs) batteries, on which we worked as our contribution to project SUPER (Supercomputing Unified Platform, Emilia Romagna) aimed at establishing and demonstrating a regional High-Performance Computing platform that is going to represent the main Italian supercomputing environment for both computing power and data volume.
translated by 谷歌翻译
Spacecraft pose estimation is a key task to enable space missions in which two spacecrafts must navigate around each other. Current state-of-the-art algorithms for pose estimation employ data-driven techniques. However, there is an absence of real training data for spacecraft imaged in space conditions due to the costs and difficulties associated with the space environment. This has motivated the introduction of 3D data simulators, solving the issue of data availability but introducing a large gap between the training (source) and test (target) domains. We explore a method that incorporates 3D structure into the spacecraft pose estimation pipeline to provide robustness to intensity domain shift and we present an algorithm for unsupervised domain adaptation with robust pseudo-labelling. Our solution has ranked second in the two categories of the 2021 Pose Estimation Challenge organised by the European Space Agency and the Stanford University, achieving the lowest average error over the two categories.
translated by 谷歌翻译
With the increase in health consciousness, noninvasive body monitoring has aroused interest among researchers. As one of the most important pieces of physiological information, researchers have remotely estimated the heart rate (HR) from facial videos in recent years. Although progress has been made over the past few years, there are still some limitations, like the processing time increasing with accuracy and the lack of comprehensive and challenging datasets for use and comparison. Recently, it was shown that HR information can be extracted from facial videos by spatial decomposition and temporal filtering. Inspired by this, a new framework is introduced in this paper to remotely estimate the HR under realistic conditions by combining spatial and temporal filtering and a convolutional neural network. Our proposed approach shows better performance compared with the benchmark on the MMSE-HR dataset in terms of both the average HR estimation and short-time HR estimation. High consistency in short-time HR estimation is observed between our method and the ground truth.
translated by 谷歌翻译
In this paper, we address the problem of multimodal emotion recognition from multiple physiological signals. We demonstrate that a Transformer-based approach is suitable for this task. In addition, we present how such models may be pretrained in a multimodal scenario to improve emotion recognition performances. We evaluate the benefits of using multimodal inputs and pre-training with our approach on a state-ofthe-art dataset.
translated by 谷歌翻译